perm filename DBL7.TEX[PEG,DBL]1 blob sn#478280 filedate 1979-10-01 generic text, type C, neo UTF8
\input cmpdes
\input ammacs
\def\draft{T}
\titlepage
\newpage
\setcount0 123
\parttitle{AM: DISCOVERY IN MATHEMATICS AS HEURISTIC SEARCH}
\baselineskip 12pt
\chapbegin{7}		% Here beginneth Chapter 7
\chaptertitle{EVALUATING AM}
\rjustline{\:B Evaluating AM}
\runningrighthead{INTRODUCTION}
\tenpoint
\vskip 14pc

\epigraph{All mathematicians are wrong at times.}
\author{Maxwell}
\epigraphend

\noindent This chapter contains discussions ``meta'' to {\AM} itself.

\yskip

First comes an essay about judging the performance of a system like {\AM}.
This is a very hard task, since {\AM} has no ``goal.'' Even using current mathematical
standards, should {\AM} be judged on what it produced, or the quality of the
path which led to those results, or the difference between what it started with
and what it finally derived?

Section 7.2 then deals with the capabilities and limitations of {\AM}:


\noindent $\bullet $  What concepts can be elicited from {\AM} now? 
With a little tuning/tiny additions?

\noindent $\bullet $  What are some notable omissions in {\AM}'s behavior? Can the user elicit these?

\noindent $\bullet $ 
 What could probably be done within a couple months of modifications?

\noindent $\bullet $ 
   Aside from a total change of domain, what kinds of activities does {\AM} lack
   (\eg, proof capabilities)? Are any discoveries (\eg, analytic function theory)
   clearly beyond its design limitations?

\yskip

Finally, all the conclusions will be gathered together, and a short summary of 
this project's `contribution to knowledge' will be tolerated.

\SSEC{Judging Performance}

One may view {\AM}'s activity  as a progression from an initial core of knowledge
to a more sophisticated 
body of concepts and their facets.
Then each of the following is a reasonable way to measure success, 
to ``judge'' {\AM}:

\BN % This does a ''Setcount8 0'' to initialize the counter

\yskip

\hh  By {\AM}'s ultimate achievements. Examine the list of 
concepts and methods {\AM} developed.
Did {\AM} ever discover anything interesting yet unknown to the user?\ffootnote{The
``user'' is a human who works with {\AM} interactively, giving it hints, commands,
questions, etc.
Notice that by ``new'' we mean new to the user, not new to Mankind. 
This might occur if the user were a child, and {\AM} discovered
some elementary facts of arithmetic.
This is not really
so provincial:  mathematicians take ``new'' to mean new to Mankind, not
new in the Universe.  I feel philosophy slipping in, so this footnote is
terminated. } Anything new to Mankind?

\hh  By the character of the difference between the initial and final states.
Progressing from set theory to number theory is much more impressive than progressing
from two-dimensional geometry to three-dimensional geometry.

\hh  By the quality of the route {\AM} took to accomplish these advances:  
How clever, how circuitous,
how many of the detours were quickly identified as such and abandoned?


\hh  By the character of the User--System interactions: How important is the user's
guidance? How closely must he guide {\AM}? What happens if he doesn't say anything ever?
When he does want to say something, is there an easy way to express that to {\AM},
and does {\AM} respond well to it?
Given a reasonable kick in the right direction, can {\AM} develop the mini-theories
which the user intended, or at least something equally interesting?

\hh  By its intuitive heuristic powers: Does {\AM} believe in ``reasonable'' conjectures?
How accurately does {\AM} estimate the difficulty of tasks it
is considering?  
Does {\AM} tie together (\eg, as analogous) concepts which are formally unrelated
yet which benefit from such a tie?

\hh  By the results of the experiments on the system (described in
Section 6.2.)
How ``tuned'' is the worth-numbering scheme? The task priority rating scheme?
How fragile is the set of initial concepts and heuristic  rules?
How domain-specific are those heuristics really? The set of facets?

\hh  By the very fact that the kinds of experiments outlined in Section 
6.2 can
easily be ``set up'' and performed on {\AM}.
Regardless of the experiments' outcomes, 
the features of {\AM} which allow them to be carried
out at all are worthy of note.

\hh  By the implications of this project. What can {\AM} suggest about educating
young mathematicians (and scientists in general)?
What can {\AM} say about doing math? about empirical research in general?

\hh  By the number of new avenues for research and experimentation it opens up.
What new projects can we propose?

\hh  By comparisons to other, similar systems.

\yyskip

\noindent For each of these \count8 measuring criteria, 
a subsection will now be provided.
Other measures of judging performance exist\footnote{For
example, Ken Colby sent transcripts of a session with PARRY to various
psychiatrists, and had them evaluate each interaction along several dimensions.
The same kind of survey could be done for {\AM}.
A quite separate measure of {\AM} would be to wait and see how many
future articles in the field refer to this work (and in what light!). 
}, of course, but haven't been
applied to {\AM}.

 \SSSEC{{\AM}'s Ultimate Discoveries}

Two of the ideas which {\AM} proposed were totally new and
unexpected:\footnote{Note
that these are ``ultimate discoveries'' only in the sense of what has been done
at the time of writing this book. For one of {\AM}'s ideas to be ``new,''
it should be previously unknown to both the author and the user. Why?
If the author knew about it, then the heuristics he provided {\AM} with
might unconsciously
encode a path to that knowledge. If the user knew about that idea, his
guidance might unconsciously help {\AM} to derive it. An even more stringent
interpretation would be that the idea be hitherto unknown to the collective
written record of Mathematics. }

\yskip

\noindent $\bullet $  Consider numbers  with an abnormally  high number of  
divisors. If
d(n)  represents  the   number  of  divisors  of  n,\footnote{For example,  d(12)  =
Size($\{$1,2,3,4,6,12$\}$)  =   6.   }  then   {\AM}   defines  the   set   of
``maximally-divisible numbers''  to 
be $\{n\epsilon\,N\  | \ (∀m<n)\  d(m)<d(n)\}$.
By factoring each such number into primes, {\AM} noticed a regularity in
them. The author then developed a ``mini-theory''  about these numbers.
It  later turned out  that Ramanujan  had already proposed  that very
same definition (in 1915),  and had found  that same regularity.  His
results only partially overlap those of {\AM} and the author, however, and
his methods are radically different.

\noindent $\bullet $  {\AM} found a cute geometric application of 
Goldbach's conjecture.
Given the set of all angles of prime degree, from 0 to 180$\deg $,\footnote{Included
are 0$\deg $ and 1$\deg $, as well as the ``typical'' primes 2$\deg $, 
3$\deg $, 5$\deg $, 7$\deg $,
11$\deg ,\ldots$, 179$\deg $. } \4any\0 angle between 0 and 180$\deg $
can be approximated to within 1$\deg $ by adding a pair of angles from this
prime set. In fact, it is hard to find smaller sets than this one
which approximate any angle to that accuracy.
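Both discoveries are easy to replay empirically. The sketch below is purely illustrative (modern Python; {\AM} itself was written in LISP): it computes the maximally-divisible numbers up to a bound straight from the definition, then checks the prime-angle approximation claim for every integer angle from 0 to 180 degrees.

```python
# Illustrative check of AM's two "new" discoveries (not AM's own code).

def d(n):
    """Number of divisors of n, e.g. d(12) = |{1,2,3,4,6,12}| = 6."""
    return sum(1 for k in range(1, n + 1) if n % k == 0)

def maximally_divisible(limit):
    """AM's definition: n such that every m < n has d(m) < d(n)."""
    best, result = 0, []
    for n in range(1, limit + 1):
        if d(n) > best:
            best = d(n)
            result.append(n)
    return result

print(maximally_divisible(200))   # [1, 2, 4, 6, 12, 24, 36, 48, 60, 120, 180]

# Goldbach application: the angles of prime degree (plus 0 and 1 degrees)
# approximate any angle between 0 and 180 degrees to within 1 degree.
primes = [p for p in range(2, 180) if d(p) == 2]
angle_set = [0, 1] + primes
assert all(
    any(abs(a - (x + y)) <= 1 for x in angle_set for y in angle_set)
    for a in range(181)
)
```

These are exactly Ramanujan's ``highly composite numbers,'' as the text notes.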

\yskip

By  and large,  the  other concepts  which {\AM}  developed  were either
already-known, or real losers. For example,
{\AM} composed Set-insert  with the predicate  Equality.
The   result   was  an   operation
Insert$\circ$Equal(x,y,z), which first tested whether x was Equal to y or
not. The value of this was either True or False.
Next,
this T/F value    was    inserted    into    z.        For   example,
Insert$\circ$Equal($\{$1,2$\}$,$\{$3,4$\}$,$\{$5,6$\}$) = $\{$False,5,6$\}$.
The    first    two
arguments are  not equal, so the  atom `False' was  inserted into the
third.  Although hitherto ``unknown,'' this operation would clearly be better off
left in that status.

Another  kind  of  loser  occurred  whenever  {\AM}  entered  upon  some
``regular'' behavior.   For  example, if  it decided  that Compose  was
interesting, it might try to create some examples of compositions. It
could do  this by  picking two  operations and  composing them.  What
better  operations   to  pick  than   Compose  and  Compose!     Thus
Compose$\circ$Compose  would be born.  By composing that  with itself, an
even      more       monstrous       operation      is       spawned:
Compose$\circ$Compose$\circ$Compose$\circ$Compose.   Since  {\AM} actually  uses the
word ``Compose'' instead of that little infix circle, the PN{\AM}E of  the
data  structure  it   creates  is  horrendous.  Its  use   is  almost
nonexistent: it must take 5 operations as arguments, and it returns a
new operation which is the composition of those five.
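The arity explosion is easy to reproduce. In the sketch below (Python, purely illustrative; the composition rule shown, which hands G the first few arguments and F the rest, is one plausible reading of multi-argument composition, not {\AM}'s actual code), composing Compose with itself twice yields exactly the five-argument monster described above.

```python
# Illustrative sketch: why Compose∘Compose∘Compose∘Compose demands
# five operations as arguments.

def compose(f, g, g_arity=1):
    """(F∘G)(x1..xn, rest) = F(G(x1..xn), rest) -- an assumed
    multi-argument composition rule, for illustration only."""
    def composition(*args):
        return f(g(*args[:g_arity]), *args[g_arity:])
    return composition

# Compose∘Compose takes THREE operations and returns (f∘g)∘h:
CC = compose(compose, compose, g_arity=2)
# Composing that with itself needs FIVE operations:
CCCC = compose(CC, CC, g_arity=3)

inc = lambda x: x + 1
dbl = lambda x: 2 * x
five_fold = CCCC(inc, dbl, inc, dbl, inc)   # inc∘dbl∘inc∘dbl∘inc
print(five_fold(3))   # inc(dbl(inc(dbl(inc(3))))) = 19
```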

An analogous danger is that {\AM} will be content to conjecture
a stream of very similar relationships (\eg, the multiplication table).
In all such cases, {\AM} must have meta-rules which pull it up out of such
whirlpools, so that it perceives a higher generalization of its previous sequence
of related activities.

In summary, then, we may say that {\AM} produced a few winning ideas new  to the
author,  a  couple  of  which   were  new  to  Mankind. 
Several  additional  ``new''  concepts  were
created which both {\AM} and the user agreed were better forgotten.  

 \SSSEC{The Magnitude of {\AM}'s Progress}

\qq{Even with men of genius, with whom the birth rate of hypotheses
is very high, it only just manages to exceed the death rate.}{W. H. George}

We can ask the  following kind of question: how many  ``levels'' did {\AM}
ascend? This is a  fuzzy notion, but basically  we shall say
that a new level  is reached when a  valuable new bunch of  connected
concepts are defined in terms of concepts at a lower level.

For example,  {\AM} started out  knowing about sets  and set operations.
When  it progressed to numbers and arithmetic,  that was one big step
up to a new level. When it zeroed in on primes, unique-factorization,
and divisibility, it had moved up another level.

When  fed simple geometry  concepts, {\AM}  moved up  one level  when it
defined some  generalizations of  the equality  of geometric  figures
(parallel lines,  congruent and  similar triangles,  angles equal  in
measure) and their invariants (rotations, translations, reflections).

The above few  examples are unfortunately exhaustive: that just about
sums up the major advances {\AM}  made.  Its progress was halted not  so
much  by cpu  time  and space,  as by  a  paucity of  meta-knowledge:
heuristic  rules  for  filling  in new  heuristic  rules.    Thus {\AM}'s
successes  are  finite,   and  its  failures  infinite,   along  this
dimension.

A more charitable view might  compare {\AM} to a human who was forced to
start from set  theory, with  {\AM}'s sparse abilities.  In that  sense,
perhaps, {\AM} would rate quite well. The  ``unfair'' advantage it had was
the  presence of many  heuristics which themselves  were gleaned from
mathematicians: \ie,  they  are  like compiled  hindsight.  A  major
purpose  of mathematics  education in  the university  is to  instill
these heuristics into the minds of the students.

{\AM} is  thus characterized as possessing heuristics which are powerful
enough to take it a  few ``levels'' away from the kind of  knowledge it
began with, but  \4only\0 a few levels. The  limiting factors are (i)
the heuristic rules {\AM}  begins with, and  more specifically (ii)  the
expertise  in recognizing  and  compiling  new heuristics,  and  more
generally  (iii) a  lack of  real-world situations  to draw  upon for
analogies, intuitions, and applications.

 \SSSEC{The Quality of {\AM}'s Route}

\qq{Thinking is not measured by what is produced, but rather is a property of
the way something is done.}{Hamming}


No  matter what  its  achievements  were,  or the  magnitude  of  its
advancement  from   initial  knowledge,  {\AM}  could\footnote{Not
necessarily {\sl WOULD } be so judged. 
Humans may very well consider an incredible
number of silly ideas before the right pair of ``hooked atoms'' collide into a
sensible thought, which is then considered in full consciousness.
If, like humans, {\AM} was capable of doing this processing in a sufficiently brief period of
real time, it would not reflect ill on its evaluation. Of course, this may simply
be the {\sl definition} of ``sufficiently brief.''} still  be  judged
``unintelligent'' if, \eg,  it were exploring  vast numbers of  absurd
avenues for each worthwhile one it found. The quality of the route {\AM}
followed is thus quite significant.

Of the two hundred
new concepts  it defined, about 130  were acceptable---in the sense
that one can defend {\AM}'s reasoning in at least exploring them; in the
sense that a human mathematician might have considered them. Of these
``winners,''  about  two dozen  were  significant---that  is, useful,
catalytic, well-known by human  mathematicians, etc.   Unfortunately,
the sixty or seventy concepts which were losers were \4real\0 losers:
the set  of
even  primes, the  set of numbers  with only  one divisor, etc.   

Once again  we  must observe  that  the quality  of  the route  is  a
function of the quality of the heuristics.   
If there are many clever
little  rules, then  the steps  {\AM} takes will  often seem  clever and
sophisticated.  If the rules superimpose nicely, joining  together to
collectively   buttress   some    specific   activity,   then   their
effectiveness may surprise---and surpass---their creator.

Such  moments of great insight (\ie,  where {\AM}'s reasoning surpassed
mine) did occur, although  rarely. Both of {\AM}'s  ``big discoveries''
started by its examining  concepts I felt weren't really interesting.
For example, I didn't  like {\AM} spending so  much time worrying  about
numbers with  many divisors; I  ``knew'' that  the converse concept  of
primes was infinitely more valuable. And yet {\AM} saw no reason to give
up on maximally-divisible  numbers; it had  several good reasons  for
continuing that inquiry  (they were the converse to  primes which had
already  proved interesting, their frequency  within the integers was
neither very high nor very low nor very regular, their definition was
simple,   they   were   extremals   of  the   interesting   operation
``Divisors-of,''  etc.,  etc.)  Similarly,  I  ``knew''  that  Goldbach's
conjecture was useless, so I was unhappy that {\AM} was bothering to try
to apply  it in the domain of geometry.   In both cases, {\AM}'s reasons
for its actions were unassailable,  and in fact it did discover  some
interesting new ideas both times.

Sometimes  {\AM}'s  behavior  was  displeasing, even  though  it  wasn't
``erring.''     Occasionally  it   was  simultaneously  developing  two
mini-theories (say primes  and maximally-divisibles).  Then  it might
pick a task  or two dealing with one of these  topics, then a task or
two dealing with the other topic, etc. The task picked at each moment
would be the  one with the  highest priority value.   As a  theory is
developed, the interestingness of its associated tasks goes up and
down; there may be doldrums for  a bit, just before falling into  the
track that  will lead  to the discovery  of a  valuable relationship.
During  these temporary lags, the interest  value of tasks related to
the other  theory's concepts  will appear  to have  a higher  priority
value: \ie, better reasons supporting it. So {\AM} would then skip over
to one of \4those\0 concepts, develop it until \4its\0 doldrums, then
return  to the  first one,  etc.   Most  humans  found this  behavior
unpalatable\footnote{Although  it  might  be  the  ``best''  from  a  dynamic
management point of view, it probably would be wrong in the long run.
Major advances really  do have lulls in their  development. } because
{\AM}  had  no compunction  about skipping  from  one topic  to another.
Humans have to retune their minds to do this  skipping, and therefore
treat it much more seriously.  For that reason, {\AM} was given an extra
mobile reason  to use  for certain  tasks on  its agenda:  ``focus  of
attention.'' Any task with the same kind of topic as the ones just
executed is given this extra reason, which raises its priority
value a little. This was sometimes enough to keep {\AM} working on a
certain mini-theory  when it  otherwise would have  skipped somewhere
else.
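A minimal sketch of the agenda-plus-focus mechanism (Python, purely illustrative; {\AM} was written in LISP, and the ratings, the bonus size, and the rule for combining reasons are invented for this example, not {\AM}'s actual formula):

```python
# Hypothetical agenda sketch: priority comes from a task's reasons,
# plus a small "focus of attention" bonus for staying on topic.

FOCUS_BONUS = 50   # assumed magnitude; AM's real values differ

class Agenda:
    def __init__(self):
        self.tasks = []          # each task: (topic, description, ratings)
        self.last_topic = None

    def add(self, topic, description, reason_ratings):
        self.tasks.append((topic, description, list(reason_ratings)))

    def priority(self, task):
        topic, _, ratings = task
        p = sum(ratings)                  # assumed combination rule
        if topic == self.last_topic:
            p += FOCUS_BONUS              # the extra "mobile" reason
        return p

    def pick_task(self):
        """Take the best task and remember its topic for the bonus."""
        best = max(self.tasks, key=self.priority)
        self.tasks.remove(best)
        self.last_topic = best[0]
        return best

agenda = Agenda()
agenda.add("Primes", "fill in examples of Primes", [300])
agenda.add("Max-divisible", "check conjecture", [320])
agenda.pick_task()               # Max-divisible wins on raw ratings
# The focus bonus now keeps AM on Max-divisible even though the
# pending Primes task has the higher raw rating:
agenda.add("Max-divisible", "generalize the regularity", [280])
print(agenda.pick_task()[0])     # Max-divisible (280 + 50 > 300)
```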

The  above ``defect'' is  a cute  little kind of  behavior {\AM} exhibited
which was non-human but not clearly ``wrong.''   There were \4genuine\0
bad moments  also, of course.  For example, {\AM}  became very
excited\footnote{Please excuse this anthropomorphism.  Technically, we may say  that
the priority value  of the best  job on the  agenda is the  ``level of
excitement'' of  {\AM}. 700 or higher is called ``excitement,'' on a scale of
0-1000.} when the conjunction of ``empty-set'' and other concepts
kept being equal to empty-set. {\AM} kept repeating conjunctions of this
form,  rather than  stepping back and  generalizing this  data into a
(phenomenological)  conjecture.    Similar  blind   looping  behavior
occurred when  {\AM} kept composing Compose with  itself, over and over.
In general, one could say that ``regular'' behavior of any kind signals
a probable  fiasco. A heuristic  rule to this  effect halted  most of
these disgraceful antics.   This rule had to be careful, since it was
almost the antithesis of the  ``focus of attention'' idea mentioned  in
the preceding paragraph.  Together, those  two rules seem to say that
you  should continue on with  the kind of thing  you were just doing,
but not for \4too\0 long a time.

\yskip

{\AM} has very few heuristics for deciding that something is
\4un\0interesting, that work on it should halt for a long time.
Rather, {\AM} simply won't have anything positive to say about that
concept, and other concepts are explored instead, essentially by
default.  Each concept has a worth component which corresponds to
its right to life (its right to occupy storage in core). This number
slowly   declines  with  time,  and   is  raised  whenever  something
interesting happens  with that  concept.  If  it ever  falls below  a
certain threshold, and if space is exhausted,\footnote{No concepts were
forgotten in this way until near the end of {\AM}'s runs, when {\AM}  would
usually collapse  from several causes  including lack  of space.} then
the concept  is forgotten: its list cells are garbage collected,
and all references to  it are erased, save  those which will keep  it
from being re-created.  This  again is not purposeful forgetting, but
rather  by default; not because  X is seen as  a dead-end, but simply
because other concepts seem so much more interesting for a long time.
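The worth-decay bookkeeping just described can be sketched as follows (Python, illustrative only; the decay rate, boost, threshold, and capacity are invented values, and the tombstone set stands in for ``references which keep it from being re-created''):

```python
# Hypothetical sketch of a concept's "right to life": worth decays
# slowly, rises when something interesting happens, and a concept is
# forgotten only when its worth is low AND space has run out.

DECAY, BOOST, THRESHOLD = 5, 100, 50    # assumed values

class Memory:
    def __init__(self, capacity):
        self.concepts = {}               # name -> worth
        self.forgotten = set()           # tombstones block re-creation
        self.capacity = capacity

    def create(self, name, worth=200):
        if name not in self.forgotten:
            self.concepts[name] = worth

    def note_interesting(self, name):
        self.concepts[name] += BOOST

    def tick(self):
        """One cycle: decay all worths; forget only under space pressure."""
        for name in self.concepts:
            self.concepts[name] -= DECAY
        if len(self.concepts) > self.capacity:       # space exhausted
            for name, worth in list(self.concepts.items()):
                if worth < THRESHOLD:
                    del self.concepts[name]          # garbage-collected
                    self.forgotten.add(name)         # never re-created

mem = Memory(capacity=2)
mem.create("Primes"); mem.create("Insert-o-Equal"); mem.create("Numbers")
mem.note_interesting("Primes")
mem.note_interesting("Numbers")
for _ in range(31):              # the neglected loser decays below 50
    mem.tick()
print(sorted(mem.concepts))      # ['Numbers', 'Primes']
```

Forgetting here is by default, not by purpose, just as in the text: the loser dies of neglect, not of refutation.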

Thus 
{\AM} did not develop the sixty ``losers'' very much: on average, only
1.5 tasks relevant to each of them were ever chosen. The ``winners''
averaged about twice as many tasks, which helped fill them out more.
Also, the worth ratings of the losers ended up far below those of the
winners in all but two cases.

The final aspect  of this  important dimension of  evaluation is  the
quality of the reasons {\AM} used to  support each task it chose to work
on.   
The
{\it Zeitgeist} of having expert systems possess self-explanatory
capabilities is reflected in both the systems described in this
book. 
{\AM}'s English  phrases corresponded  quite nicely  to the
``real'' reasons a human might give to justify why  something was worth
trying, and  the ordering of the  tasks on the agenda  was rarely far
off from the one  that I would have  picked myself. 

 \SSSEC{The Character of the User-System Interactions}

{\AM}  is   not  a   ``user-oriented''  system.   There  were  many   nice
human-interaction features  in the original grandiose proposal for {\AM}
which never got off the drawing board. At the heart of these features
were two assumptions:

\yskip

\noindent $\bullet $   The  user must  understand {\AM},  and 
{\AM} must  likewise have  a good
model of  the particular human using {\AM}. The only time either should
initiate a  message is when  his model of  the other  is not what  he
wants  that  model to  be.    In that  case,  the  message should  be
specifically designed  to  fix  that  discrepancy.\footnote{This  idea  was
motivated by a lecture given in 1975 by Terry Winograd.}

\noindent $\bullet $ Each  kind of message  which is  to pass between  
{\AM} and its  user
should  have its own  appropriate language.   Thus there  should be a
terse comment language, whereby the user can note how he  feels about
what {\AM} is doing, a questioning language for either party to get/give
reasons  to the other,  a picture language  for communicating certain
relationships, etc.

\yskip

Neither of these ideas  ever made it into  the LISP code that  is now
{\AM}, although  they are  certainly not prohibited  in any way  by {\AM}'s
design. 

As one might expect, the reason for this atrophy is simply that
very little guidance from the user was needed by {\AM}. In fact, all
the discoveries, cpu time figures, etc. mentioned in this document
are taken from totally unguided runs by {\AM}. If the user guides as
well as he  can, then about  a factor  of 2 or 3 speedup  is possible.  Of
course, this assumes  that the user  is dragging {\AM} directly  along a
line of development he knows will be successful. The user's ``reasons''
at each step are based essentially on hindsight. Thus this is  not at
all ``fair.'' If  {\AM} ever becomes more user-oriented, it  would be nice
to  let children (say 6-12 years old)  experiment with it, to observe
them working  with  it in  domains unfamiliar  to  either of
them.\footnote{Starred (*)  exercise for the reader:  carry out such a  project on a
statistically significant sample of children, wait thirty years,  and
observe the  incidence of mathematicians  and scientists  in general,
compared to the national averages. Within whatever occupation they've
chosen, rate their creativity and productivity. }

The user  can  ``kick''  {\AM}  in  one direction  or  another,  \eg,  by
interrupting  and telling  {\AM}  that Sets  are  more interesting  than
Numbers.
  Even in  that particular case,  {\AM} fails to
develop any  higher-level  set  concepts  (diagonalization,  infinite
sets,  etc.) and  simply  wallows  around in  finite  set theory  (de
Morgan's  laws,  associativity  of  Union,  etc.).    When  geometric
concepts are  input, and  {\AM} is  kicked in  \4that\0 direction,  much
nicer results are obtained. See the report on the 
Geometry experiment, in section 6.2.5.

There is one important result to  observe: the very best examples  of
{\AM} in action were brought to full fruition only by a human developer.
That is, {\AM}  thought of a couple great concepts, but couldn't develop
them well on its own. A human (the author) then took them  and worked
on them by hand, and interesting results were achieved. These results
could be  told to {\AM}, who could then go off and look for new concepts
to investigate.    This interaction  is  of course  at a  much  lower
frequency than the  kind of rapidfire question/answering talked about
above. Yet it  seems that such  synergy may be  the ultimate mode  of
{\AM}-like systems: creative {\it assistants} to experts.

 \SSSEC{{\AM}'s Intuitive Powers}

\qq{Intuitive conviction surpasses logic as the brilliance of the sun
surpasses the pale light of the moon.}{Kline}


Let  me  hasten  to  mention  that  the   word  ``intuitive''  in  this
subsection's  title is  not related  to the  (currently non-existent)
``Intuitions'' facets of the concepts.   What is meant is the  totality
of  plausible reasoning  which  {\AM} engages  in: empirical  induction,
generalization,  specialization, maintaining reasons  for jobs on the
agenda list, creation of analogies between bunches of concepts, etc.

{\AM} only considers  conjectures which have been  explicitly suggested:
either by empirical evidence, by analogy, or (de-implemented now:) by
Intuition facets.   Once  a  conjecture has  been formulated,  it  is
tested in  all  ways possible:  new experimental  evidence is  sought
(especially extreme cases), it is examined formally\footnote{Currently, this
is done in trivial ways. An open problem, which is under  attack now,
is to add more  powerful formal reasoning abilities to {\AM}.   } to see
if it follows from already-known conjectures, etc.

Because  of this grounding in plausibility,  the only conjectures the
user ever sees  (the ones {\AM}  is testing) are  quite believable.   If
they  turn out  to  be false,  both he  and  {\AM} are  surprised.   For
example, both {\AM} and the user were disappointed when nothing came out
of the concept  of Uniquely-prime-addable numbers  (positive integers
which  can be represented as  the sum of two  primes in precisely one
way).   Several conjectures  were proposed  via  analogy with  unique
prime factorization,  but none of  them held experimentally.  Each of
them seemed worth investigating, to both the user and the system.
It is still not known whether there is anything interesting about that concept
or not. 

{\AM}'s estimates of the value of each task it attempted were often far
off from what hindsight proved their true values to be. Yet this is
not so different from the situation a real researcher faces, and it
made little difference to the discoveries and failures of the system.
{\AM}  occasionally mismanaged  its  resources due  to  errors in  these
estimates.   To  correct for  such erroneous  prejudgments, heuristic
rules were permitted to  dynamically alter the time/space quanta  for
the current task. If some interesting new result turned up, then some
extra resources would be allotted. If certain heuristics failed, they
could reduce the time limits, so not as much total  cpu time would be
wasted on this loser.
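That quantum-adjustment mechanism can be sketched in a few lines (Python, illustrative only; the bonus and penalty amounts are assumptions, not {\AM}'s actual values):

```python
# Hypothetical sketch: heuristic rules dynamically stretch or shrink
# the time quantum of the current task as its true value emerges.

def work_on(task_steps, quantum):
    """Run steps until the (adjustable) quantum is used up.
    Each step returns 'interesting', 'failed', or None."""
    used = 0
    for step in task_steps:
        if used >= quantum:
            break
        used += 1
        outcome = step()
        if outcome == "interesting":
            quantum += 4          # assumed bonus: allot extra resources
        elif outcome == "failed":
            quantum -= 2          # assumed penalty: cut losses early
    return used

# A task that looks dull at first but turns up something interesting
# earns itself extra cpu time:
steps = [lambda: None, lambda: "interesting"] + [lambda: None] * 10
print(work_on(steps, quantum=3))   # runs 7 steps instead of 3
```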

Some poor analogies  were considered, like the  one between bags  and
singleton-bags.   The  ramifications of  this analogy  were painfully
trivial.\footnote{The bag-operations, applied to singletons, did not produce
singletons  as  their  result:  (x)$\union $(y)  is  (x,y)  which  is  not  a
singleton. Whether a bag-operation did produce a singleton bag or not 
depended only  on the equality or
inequality of the two  arguments.  There  were many tiny  conjectures
proposed which merely re-echoed this general conclusion. }

 \SSSEC{Experiments on {\AM}}


The experiments  described in  Section 6.2
provide  some results relevant to the overall value
of the  {\AM}  system.   The  reader  should consult  that  section  for
details; neither the  experiments nor their results  will be repeated
here.   A few conclusions  will be summarized:

The worth-numbering scheme  for the concepts  is fairly robust:  even
when all the concepts' worths are initialized at the same value, the
performance  of  {\AM}  doesn't  collapse,  although  it  is   noticeably
degraded.

Certain mutilations of the priority-value scheme  for tasks on the agenda will
cripple  {\AM}, but  it can resist  most of  the small changes  tried in
various experiments.

Sometimes, removing  just  a  single concept  (\eg,  Equality)  was
enough  to  block  {\AM}  from discovering  some  valuable  concepts  it
otherwise got (in this case, Numbers). This makes {\AM}'s behavior sound
very fragile, like a slender chain of advancement.   But on the other
hand,  many concepts  (\eg,  TIMES, Timber\-line, Primes\footnote{Primes
was discovered independently as follows: all numbers (>0) were seen
to be representable as the sum of smaller numbers; Add was known to be
analogous to TIMES; but not all numbers (>1) appeared to be representable as the
{\sl product} of two smaller ones; Rule number 166 triggered (see Appendix
2), and {\AM} defined the set of exceptions: the set
of numbers which could not be expressed as the product of two smaller ones; \ie,
the primes. })  were  discovered in
several independent  ways.   If  {\AM}'s  behavior  {\sl is} a  chain,  it  is
multiply-stranded (except for a few weak spots, like ``Numbers'').
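The alternate route to Primes described in the footnote is easy to replay (Python sketch, illustrative only): collect the exceptions to ``every number is the product of two smaller ones.''

```python
# Illustrative replay of the footnote's step: the numbers (>1) which
# are NOT the product of two smaller numbers are exactly the primes.

def product_of_two_smaller(n):
    """Is n = a*b for some a and b both smaller than n (hence both >= 2)?"""
    return any(n % a == 0 for a in range(2, n))

exceptions = [n for n in range(2, 50) if not product_of_two_smaller(n)]
print(exceptions)   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```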

The heuristics are specific to their stated domain  of applicability.
Thus when working in geometry,  the Operation heuristics were just as
useful as they were when {\AM} worked in elementary set theory or number
theory. The set of facets seemed adequate for those domains, too. The
Intuition facet, which was  rejected as a valid source of information
about sets and numbers, might  have been more acceptable in  geometry
(\eg,  something  similar  to  Gelernter's   model  of  a  geometric
situation).

 \SSSEC{How to Perform Experiments on {\AM}}


The very  fact that the  kinds of experiments  mentioned in  the last
section (and  described in detail in Section 6.2)
can be ``set up'' and  performed on {\AM}, reflects an important  quality
of the {\AM} program.

Most of those experiments took only a matter of minutes to set up, requiring only a
few  tiny modifications to  {\AM}.  For  example, the one  where all the
Worth  ratings  were  initialized  to  the  same  value  was  done  by
evaluating the single LISP expression:

\han6{{\6 (MAPC CONCEPTS '($\lambda $ (c) (PUT c 'Worth 200)))}}

\yskip

Similarly, here is how {\AM}  was modified to treat all tasks as if they
had equal value: the function Pick-task has a statement of the form

\han6{{\6 (SETQ Current-task (First-member-of Agenda))}}

\noindent All that  was necessary  was  to replace  the  call on  the  function
``\6First-member-of\1''
by the function ``\6Random-member-of\1.''

\yskip

Even  the most  sophisticated experiment, the  introduction of  a new
bunch of  concepts---those  dealing with  geometric  notions  like
Between, Angle,  Line---took only  a day of conscious work to
set up.

Of course \4running\0 the experiment involves the expenditure of hours of
cpu time.

There  are  certain  experiments  one  can't  easily  perform on  {\AM}:
removing all  its heuristics,  for  example.   Most heuristic  search
programs  would then  wallow around,  displaying just  how  big their
search space really was. But {\AM} would just sit there, since it'd have
nothing plausible to do.  {\AM}'s heuristics are {\sl plausible move
generators}, not {\sl implausible move constrainers}.

Many other  experiments, while  cute and  easy to  set up, are  quite
costly in terms of cpu time. For example, the class of experiments of
the form: ``remove heuristics x,  y, and z, and observe the  resultant
effect on {\AM}'s behavior.''  This observation would entail running {\AM}
for an hour or two of cpu time!  Considering the number of subsets of
heuristics, not all these questions are going to get  answered in our
universe's lifetime.  Considering  the small probable payoff from any
one such experiment, very few should actually be attempted.

One nice experiment would be to monitor the contribution each heuristic
is making. That is, record each time it is used  and record the final outcome
of its activation (which may be several cycles later). 
Unfortunately,
{\AM}'s heuristics are not all coded as separate Lisp entities, which one could
then ``trace.'' Rather, they are often interwoven with each other into large
program pieces. So this experiment can't be easily set up and run on {\AM}.
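Had the heuristics been separate, traceable entities, the proposed monitoring would amount to wrapping each one with a little bookkeeping. The sketch below (in modern Python, with a hypothetical heuristic and a crude stand-in for ``final outcome'') shows the shape such tracing could take; it is not how {\AM} itself was coded:

```python
# Sketch of per-heuristic bookkeeping, assuming (unlike AM) that each
# heuristic is a separate entity.  Names here are hypothetical.
from collections import defaultdict

usage = defaultdict(int)    # how often each heuristic fired
credit = defaultdict(int)   # how often its activation seemed to pay off

def traced(name, heuristic):
    """Wrap a heuristic so every invocation is recorded under `name`."""
    def wrapper(*args, **kwargs):
        usage[name] += 1
        result = heuristic(*args, **kwargs)
        if result:              # crude proxy for a useful final outcome
            credit[name] += 1
        return result
    return wrapper

# A toy heuristic: propose filling in examples of a defined concept.
def fill_in_examples(concept):
    return concept.get("definition") is not None

fill_in_examples = traced("fill-in-examples", fill_in_examples)

fill_in_examples({"definition": "lambda x: x == x"})   # fires, pays off
fill_in_examples({})                                   # fires, no payoff
print(usage["fill-in-examples"], credit["fill-in-examples"])  # 2 1
```

In {\AM}, crediting an activation would be harder still, since the payoff may come several cycles after the heuristic runs.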

Even those experiments which can be run can be set up
only by someone familiar with the LISP code of {\AM}.  It would be
quite  hard to  modify {\AM}  so  that the  untrained user  could easily
perform these  experiments. Essentially,  that would  demand that  {\AM}
have a  deep understanding of  its own  structure. This is  of course
desirable, fascinating, challenging, but wasn't part of the design of
{\AM}.\footnote{A general suggestion  for future research projects in this area:
such  systems  should  be  designed  in a  way  which  facilitates  a
poorly-trained   user   not   only    {\sl using}   the   system    but
{\sl experimenting} on it. }

 \SSSEC{Future Implications of this Project}

One harsh measure of {\AM} would be to ask what possible applications
it will have. This really means (i) the uses for the {\AM} system, (ii)
the  uses for  the  ideas  of  how  to  create  such  systems,  (iii)
conclusions about math and science one can draw from experiments with
{\AM}.

Here are some of these implications, both real and potential:

\yskip

\BN

\hh   New tools  for  computer  scientists  who want  to  create  large
knowledge-based systems to emulate some creative human activity.



$\ \ \ \ \ \ $\61a.\0 The modular representation of knowledge that {\AM} uses might
prove to be effective in any knowledge-based system.  Division of a
global problem into a multitude of small chunks, each one setting up
a quite local ``expert'' on some concept, is a
nice way to make a hard task more manageable.  Conceivably, each
needed expert could be filled in by a human who really is an expert
on that topic.  Then the global abilities of the system would be able
to rely on quite sophisticated local criteria.  Fixing a set of
facets once and for all permits effective inter-module communication.

$\ \ \ \ \ \ $\61b.\0 Some ideas may carry over unchanged into many fields of human
creativity, wherever local  guiding rules exist.  These include:  (a)
ideas  about heuristics  having  domains  of applicability,  (b)  the
policy  of  tacking  them  onto  the  most general  knowledge  source
(concept, module)  they are relevant  to, (c)  the rippling  scheme to
locate relevant knowledge, etc.


\hh  A body of heuristics which can be built upon by others.


$\ \ \ \ \ \ $\62a.\0  Most of  the  particular heuristic  judgmental criteria  for
interestingness,   utility,  etc.,  might   be  valid  in  developing
theorizers in other sciences.   Recall that each rule has  its domain
of applicability; many of the heuristics in {\AM} are quite general.

$\ \ \ \ \ \ $\62b.\0 Just within the small  domain in which {\AM} already works, this
base of heuristics  might be  enlarged through  contact with  various
mathematicians. If  they are willing  to introspect  and add some  of
their ``rules'' to {\AM}'s existing base, it might gradually grow more and
more powerful.

$\ \ \ \ \ \ $\62c.\0 Carrying this  last point  to the limit  of possibility,  one
might imagine the program possessing  more heuristics than any single
human.  Of course,  {\AM} as  it stands  now is  missing so much  of the
`human element', the life experiences that a mathematician draws upon
continually  for inspiration,  that merely  amassing  more heuristics
won't  automatically  push   it  to  the   level  of  a   super-human
intelligence.  Another  far-out   scenario  is  that  of   the  great
mathematicians of each generation pouring their individual heuristics
into an {\AM}-like system.  After a few generations have come  and gone,
running  that  program  could  be  a  valuable  way  to  bring  about
`interactions' between people who were not contemporaries.


\hh  New and better strategies for math educators. 


$\ \ \ \ \ \ $\63a.\0 Since the key to {\AM}'s success seems to be its heuristics, and
not  the particular  concepts  it  knows, the  whole  orientation  of
mathematics education should perhaps be modified, to provide experiences for
the student which will build up  these rules in his mind. Learning  a
new theorem is  worth much less  than learning a new  heuristic which
lets you  discover new theorems.\footnote{Usually. One  kind of exception is
the following: the  ability to take a  powerful theorem, and  extract
from it a new, powerful heuristic. {\AM} cannot do this, but it may turn
out  that this mechanism  is quite crucial for  humans' obtaining new
heuristics. This is another open  research problem. } I am
far from the first to urge such a revision (see, \eg, [Koestler 67], p. 265,
or [Papert 72]).


$\ \ \ \ \ \ $\63b.\0   If  the  repertoire   of  intuition  (simulated  real-world
scenarios) were sufficient for  {\AM} to develop elementary  concepts of
math, then educators should  ensure that children (4-6 years old) are
thoroughly exposed to those scenarios.  Such activities would include
seesaws,  slides,  piling  marbles into  pans  of  a  balance  scale,
comparing the heights  of towers built out of cubical blocks, solving
a jigsaw puzzle, etc.  Unfortunately, {\AM} failed to show the  value of
these few scenarios.  This was  a potential application which was not
confirmed.

$\ \ \ \ \ \ $\63c.\0 One use for {\AM} itself would be as a ``fun'' teaching tool. If a
very nice user interface
is constructed, {\AM}  could serve  as a model  for, say,
college freshmen with no  math research experience. They could  watch
{\AM}, see the kinds of things it  does, play with it, and perhaps get a
real  flavor for (and get  turned on by) doing  math research. A vast
number of brilliant minds are too turned off  by high-school drilling
and  college calculus to  stick around  long enough  to find  out how
exciting---and different---research math  is compared to  textbook
math.


\hh  Further experiments  on {\AM} might  tell us something about  how the
theory  formation task changes  as a theory  grows in sophistication.
For example, can the same methods which lead {\AM}  from premathematical
concepts  to  arithmetic also  lead  {\AM}  from  number systems  up  to
abstract  algebra?   Or are  a new  set of  heuristic rules  or extra
concepts required?    My guess  is that  a few  of  each are  lacking
currently, but \4only\0 a few.  There is a great deal of disagreement
about this  subject  among  mathematicians.   By  tracing  along  the
development of mathematics,  one might categorize discoveries  by how
easy  they would  be for  an {\AM}-like  system to  find.   Sometimes, a
discovery required the invention of a brand new heuristic rule, which
would  clearly  be  beyond  {\AM} as  currently  designed.    Sometimes,
discovery  is  based  on the  lucky  random  combination  of existing
concepts, for no good \4a priori\0 reason. It would be instructive to
find out how often this  is necessarily the case: how often \4can't\0
a mathematical discovery be motivated and ``explained'' using heuristic
rules of the kind {\AM} possesses?

\hh  An  unanticipated result was  the creation of  new-to-Mankind math
(both   directly  and  by  defining   new,  interesting  concepts  to
investigate by hand).  The number of new bits of mathematics
developed to date is minuscule.


$\ \ \ \ \ \ $\65a.\0 As described  in (2c) above, {\AM} might  absorb heuristics from
several  individuals and thereby integrate their particular insights.
This might eventually result in new mathematics being discovered.

$\ \ \ \ \ \ $\65b.\0 An even more exciting prospect, which never materialized, was
that  {\AM}  would  find  a  new  redivision of  existing  concepts,  an
alternate  formulation  of   some  established   theory,  much   like
Hamiltonian mechanics is  an alternate unification of the  data which
led  to Newtonian  mechanics.   The  only rudimentary  behavior along
these lines was when {\AM} occasionally derived a familiar concept  in an
abnormal way (\eg, TIMES was derived  in four ways; Prime pairs were
noticed by restricting Addition to primes).

 \SSSEC{Open Problems: Suggestions for Future Research}

While {\AM} can and should stand as a complete research project, part of
its value  will stem from whatever future  studies are sparked by it.
Of course the ``evaluation'' of  {\AM} along this dimension must wait  for
years, but even  at the present time several such  open problems come
to mind:

\yskip

$\bullet $  Devise  Meta-heuristics,  rules  capable  of  operating  on  and
synthesizing new heuristic rules.  {\AM} has shown the solution  of this
problem  to be  both  nontrivial and  indispensable.   {\AM}'s  progress
ground to  a  halt  because fresh,  powerful  heuristics  were  never
produced.  The next  point suggests that the same need  for new rules
exists in mathematics as a whole:

$\bullet $  Examine the  history of mathematics, and gradually build up a list
of the  heuristic rules used.   Does  the following  thesis have  any
validity:  ``{\sl The  development  of  mathematics  is  essentially  the
development  of new heuristics}.''  That is, can we  `factor out' all
the discoveries reachable by the set of  heuristics available (known)
to the mathematicians at some  time in history, and then explain each
new  big  discovery  as  requiring  the  synthesis  of  a  brand  new
heuristic?  For  example, Bolyai and  Lobachevsky did this  a century
ago when  they decided that counter-intuitive  systems might still be
consistent and interesting. Non-Euclidean  geometry resulted, and  no
mathematician today would think twice  about using the heuristic they
developed.   Einstein invented a new heuristic more recently, when he
dared to consider that counter-intuitive systems  might actually have
physical reality.\footnote{As Courant says, ``When Einstein  tried to reduce
the notion of `simultaneous events occurring at different places'  to
observable phenomena,  when he unmasked  as a  metaphysical prejudice
the  belief  that this  concept  must have  a  scientific  meaning in
itself, he had found the key to his theory of relativity.'' } What was
once a bold new method is now a standard tool in theoretical physics.

$\bullet $   In a  far less  dramatic vein,  a hard  open problem  is that  of
building  up  a  body  of  rules  for  symbolically  instantiating  a
definition  (a  LISP predicate).    These  rules  may  be  structured
hierarchically,  so that rules  specific to operating  on `operations
whose domain  and range  are equal'  may be  gathered.   Is this  set
finite and manageable; \ie, does some sort of ``closure'' occur after
a few hundred (thousand?) such rules are assembled?

$\bullet $  More generally,  we can ask for the expansion of all the heuristic
rules, of all  categories. This may  be done  by eliciting them  from
mathematicians,  or automatically  by the application  of very
sophisticated meta-heuristics.  Some categories of rules include: how
to generalize/specialize definitions, how to find examples of a given
concept, how to optimize LISP algorithms.

$\bullet $  Experiments can be done  on {\AM}. A few have been performed already,
many more are proposed  in Section 6.2, and  no
doubt some additional ones have already occurred to the reader.

$\bullet $  Extend the  analysis already begun 
of the set of heuristics {\AM}
possesses.
One reason for such an analysis would be to achieve a better understanding
of the contribution of the heuristics. 
In some sense, the heuristics and the choice of starting concepts ``encode'' 
the discoveries which {\AM} makes, and the way it makes them. A better understanding
of that encoding may lead to new ideas for {\AM} and for future {\AM}-like systems.
More generally, we know very little about ``the space of heuristics;''
the time may be ripe for research in this area.

$\bullet $  Rewrite {\AM}. In Chapter 1, it was pointed
out that there are two common species of heuristic search programs.
One type has a legal move generator, and heuristics to  constrain it.
The second type, including {\AM}, has only a set of heuristics, and they act as
plausible move generators. Since {\AM} seemed to create new concepts,
propose new conjectures, and formulate new tasks
in a very few
distinct ways, it might very well be feasible to find a purely syntactic
``legal move generator'' for {\AM}, and to convert each existing heuristic into
a form of constraint. In that case, one could, \eg, remove all the heuristics
and still see a meaningful (if explosive) activity proceed. There might be a few
surprises down that path.

$\bullet $  A more tractable project, a subset of the former one, would be to recode
just the conjecture-finding heuristics as constraints on a new, purely syntactic
``legal conjecture generator.'' A simple Generate-and-Test paradigm would be used to
synthesize and examine large numbers of conjectures. 
Again, removing all the heuristics
would be a worthwhile experiment.

$\bullet $  At the reaches of feasibility, one can imagine trying to extend {\AM} into more and
more  fields, into less-formalizable  domains. International politics
has already been suggested as a very hard future applications area.

$\bullet $  Abstracting that  last point, try  to build up  a set of  criteria
which make a domain ripe for  automating (\eg, it possesses a strong
theory, it is knowledge-rich (many heuristics exist), the performance
of the professionals/experts is much better than that  of the typical
practitioners,  the new  discoveries in  that field  all fall  into a
small variety  of syntactic  formats,$\ldots $?).   Initially,  this  study
might  help  humans  build better  and  more  appropriate  scientific
discovery programs.  Someday, it might even permit the creation of an
automatic-theory-formation-program-writer.

$\bullet $  The interaction between  {\AM} and the  user is minimal and  painful.
Is there a more effective  language for communication? Should several
languages  exist,  depending  on  the  type  of  message  to be  sent
(pictures,  control  characters,   a  subset  of  natural   language,
induction  from  examples,  etc.)?  Can  {\AM}'s  output  be  raised  in
sophistication by introducing an internal  model of the user and  his
state of knowledge at each moment?  Perhaps {\AM} should also contain
models of several known \4groups\0 of users (mathematicians,
computer scientists, students)---say, stored as concepts in a
specialization/generalization hierarchy.

$\bullet $  Human protocol studies  may be appropriate, to test  out the model
of mathematical research  which {\AM} puts forward. Are the sequences of
actions similar? Are the mistakes analogous? Do the pauses  which the
humans emit \4quantitatively\0 correspond to {\AM}'s periods of gathering
and running `Suggest' heuristics?

$\bullet $   Can the idea  of Intuition  functions be developed  into a useful
mechanism?   If not, how  else might  real-world experiences be  made
available to an automated  researcher to draw upon (for analogies, to
base new theories  upon)? Could  one interface  physical effectors  and
receptors and quite  literally allow the  program to `play  around in
the real world' for its analogies?

$\bullet $   Most of the `future  implications' discussed in  the last section
suggest future  activities  (\eg, new  educational  experiments  and
techniques).

$\bullet $  Most  of the `limiting  assumptions' discussed in a  later section
(2.7.1.5) can  be tackled  with today's techniques  (plus a
great deal of effort).   Thus each of them counts as an  open problem
for research.

$\bullet $  Perform an information-theoretic analysis on {\AM}. What is the value
of each heuristic? the new information content of each new conjecture?

$\bullet $  If you're interested in natural language, the very hard problem exists of giving
{\AM} (or a similar system) the ability to really do inferential processing
on the \4reasons\0 attached to tasks on the agenda. Instead of just being
able to test for equality of two reasons, it would be much more intelligent to
be able to infer the kind of relationship between any two reasons; if they overlap
semantically, we'd like to be able to compute precisely
how that should affect the overall rating for the task; etc.


$\bullet $  Modify the control structure of {\AM}, as follows. Allow mini-goals to exist,
and supply new rules for setting them up 
(plausible goal generators)
and altering those goals, plus some new rules
and algorithms for satisfying them. The modification I have in mind would result
in new tasks being proposed because of certain current goals, and existing tasks
would be reordered so as to raise the chance of satisfying some important goal.
Finally, the human watching {\AM} would be able to observe the rationality
(hopefully) of the goals which were set. The simple ``Focus of Attention''
mechanism already in {\AM} is a tiny step in this goal-oriented direction.
Note that this proposal itself demonstrates that {\AM} is not inherently opposed
to a goal-directed control structure. Rather, {\AM} simply possesses only a partial set of
mechanisms for complete reasoning about its domain.


 \SSSEC{Comparison to Other Systems}

One popular way to judge a system is to  compare it to other, similar
systems, and/or to  others' proposed criteria for such systems. There
is no other project (known to  the author) 
having the same objective: automated math research.\footnote{In
[Atkin & Birch 1971], \eg,
we find no mention of the computer except as a number cruncher. }
Many somewhat related efforts have been reported 
in the literature
and will be mentioned here.

Several  projects have been undertaken which  overlap small pieces of
the {\AM}  system  and in  addition concentrate  deeply  upon some  area
\4not\0  present in {\AM}.    For example,
the CLET system [Badre 73]
worked  on  learning  the  decimal  addition  algorithm\footnote{Given  the
addition table up  to 10 +  10, plus an  English text description  of
what it means to carry, how and when to carry, etc., actually write a
program  capable   of  adding   two  3-digit   numbers. }   but   the
``\4mathematics  discovery\1''  aspects of  that  system  were  neither
emphasized  nor  worth  emphasizing; it  was  an  interesting natural
language communication study.   The same  comment applies to  several
related studies by IMSSS.\footnote{See [Smith 74a], for example.}

Boyer  and Moore's  theorem-prover 
[Boyer&Moore 75]
embodies
some of the spirit of {\AM} (\eg, generalizing the definition of a LISP
function), but  its motivations  are quite  different, its  knowledge
base is  minimal, and its methods purely  formal.\footnote{This is not meant
as criticism; considering the goals of those researchers, and the age
of that system, their work is  quite significant. }  The same comments
apply to the S{\AM} program [Guard 69], in which a resolution theorem-prover is set to
work on unsolved problems in lattice theory.

Among the attempts to incorporate heuristic knowledge into a theorem
prover, we should also mention [Wang 60], 
[Pitrat 70],
[Bledsoe 71], and [Brotz 74].
How did {\AM} differ from these ``heuristic theorem-provers''?
The goal-driven control structure of these systems is a real but only minor
difference from {\AM}'s control structure
(\eg, {\AM}'s ``focus of attention'' is a rudimentary step in that direction).
The fact that their overall activity is typically labelled as deductive
is also not a fundamental distinction
(since constructing a proof is usually in practice quite 
{\sl in}ductive).\footnote{See [Lakatos 76] for a marvelous discussion of
the continual spiral of criticism and improvement, as the mathematics
community zeros in toward a proof, uncovers a refutation, finds a new
proof of a slightly modified conjecture, and so on. } 
Even the character of the inference processes is analogous: the provers typically
contain a couple of binary inference rules, like Modus Ponens, which are relatively
risky to
apply but can yield big results; {\AM}'s few  ``binary''
operators have the same characteristics: Compose,  Canonize, Logically-combine 
(disjoin and conjoin).
The main distinction is that
{\it the theorem provers each incorporate only
a handful of heuristics.}  The reason for this, in turn, is the paucity of
good heuristics which exist for the very general task environment in which they
operate: domain-independent (asemantic) predicate calculus theorem proving.
The need for additional guidance was recognized by these researchers.
For example, see [Wang 60], p. 3 and p. 17. Or as
Bledsoe says:\footnote{[Bledsoe 71], p. 73 }

\yskip

\moveright 30 pt \hbox par 270 pt{\sl There is  a real 
difference between {\it doing}  some mathematics  and
{\it being}  a  mathematician. The  difference  is  principally one  of
judgment: in the selection  of a problem (theorem  to be proved);  in
determining its relevance;...   It is  precisely in these  areas that
machine provers have been so lacking. This kind of judgment has to be
supplied by the user... Thus  a crucial part of the resolution  proof
is the {\it selection} of the reference theorems by the {\it human} user;
the  human, by this one action, usually  employs more skill than that
used by the computer in the proof.}

\yskip

Many researchers  have constructed  programs which pioneered  some of
the  techniques {\AM} uses.
[Gelernter 63] reports
the use of  prototypical examples  as analogic  models to  guide  search in
geometry, and [Bundy 73] employs models of ``sticks''  to help his program work  with
natural numbers.  The  single heuristic of analogy was studied in 
[Evans 68] and [Kling 71].
Brotz's program, [Brotz 74], uses  this to  propose useful
lemmata. 

Theory formation systems in \4any\0 field have been few. Meta-Dendral 
[Bu\-chan\-an 74]
represents perhaps the best of these.   Its task is to  unify a body of  mass
spectral data (examples of ``proper''  identifications of spectra) into
a  small body of  rules for making  identifications.   Thus even this
system  is  given  a  fixed  task,  a  fixed  set  of  data  to  find
regularities within.   {\AM}, however, must find its own  data, and take
the  responsibility for  managing its own  time, for  not looking too
long at worthless  data.\footnote{Another way of looking at that:
Meta-Dendral has a fixed set of templates for rules which
it wishes to find, and a fixed  vocabulary of mass spectral concepts which can
be plugged into those templates. {\AM} also has only a few stock formats for conjectures,
but it selectively enlarges its vocabulary of math concepts. }
There has been much written about scientific theory formation
(\eg, [Hempel 52]), but very little of it is specific enough to be of immediate
{\sl operational }
use to AI researchers. A couple of pointers to excellent discussions of this sort
are: [Fogel 66],
[Simon 73], and [Buchanan 75].
Also worth noting is a discussion near the end of [Amarel 69], in which 
``formation'' and ``modelling'' problems are treated:

\yskip

\moveright 30 pt \hbox par 270 pt{\sl The problem of model finding is 
related to the following general question
raised by Schutzenberger [{\rm\ In discussion at the Con\-fer\-ence on 
Intelligence and
Intelligent Systems, Athens, Ga., 1967}]: {\it `What do we want to do with intelligent
systems that relates to the work of mathematicians?'}.
So far all we have done in this general area is to emulate some of the reasonably simple activities
of mathematicians, which is finding consequences from given assumptions, 
reasoning, proving
theorems. A certain amount of work of this type was already done in the propositional and
predicate calculi, as well as in some other mathematical systems. But this is 
only one aspect of the work that goes on in mathematics.}

\yskip

\moveright 30 pt \hbox par 270 pt{\sl 
Another very important aspect is the one of finding general properties of structures, finding
analogies, similarities, isomorphisms, and so on.
This is the type of activity  that is extremely important
for our understanding of model-finding mechanisms. Work in this
area is more difficult than theorem-proving. The problem here is that of
{\it theorem finding.}}

\yskip

{\AM} is one of the first attempts to construct a ``theorem-finding'' program. 
As Amarel noted, it may be
possible to learn from such programs how to tackle the general task of automating
scientific research.

Besides ``math systems,'' ``creative thinking systems,'' and ``theory
formation systems,'' we should at least discuss others' thoughts on
the issue of  algorithmically doing math research.   Some individuals
feel  it is  not so  far-fetched  to imagine  automating mathematical
research (\eg,  Paul Cohen).   Others (\eg,  P\'olya) would  probably
disagree.  The presence  of a  high-speed, general-purpose
symbol  manipulator
in our  midst  now  makes  investigation  of  that  question
possible.

There  has been  very  little published  thought  about discovery  in
mathematics from an  algorithmic point of  view; even clear  thinkers
like P\'olya  and  Poincar\'e treat  mathematical ability  as a  sacred,
almost  mystic quality,  tied to  the unconscious.   The  writings of
philosophers and psychologists  invariably attempt  to examine  human
performance and belief, which are far more manageable than creativity
\4in  vitro\1.   Belief  formulae  in inductive  logic\footnote{For example, see
[Hintikka 62], [Pietarinin 72].
The latter also contains a good summary of Carnap's  $\lambda ,\,\alpha$ 
formalization. }
invariably fall  back upon how  well they  fit
human measurements.  The abilities of a computer and a brain are too
distinct to justify blindly working for results (let alone
algorithms!) in the one which match those of the other.

\SSEC{Capabilities and Limitations of {\AM}}


The first two subsections contain a general discussion of what {\AM} can and
can't do.  Later subsections deal with  powers and limitations inherent in using
an agenda scheme, 
in fixing the domain of {\AM}, and in picking one specific
model of math research to build {\AM} upon.
The {\AM} 
program exists only because a great many simplifying assumptions were tolerated;  
these
are discussed
in Section 7.2.6.
Finally, some speculation is made  about the ultimate
powers and weaknesses of any systems which are designed very much like {\AM}.

 \SSSEC{Current Abilities}

\4What fields has {\AM} worked in so far?\0 {\AM}  is now able to explore a
small  bit of  the theory  of  sets, data  types, numbers,  and plane
geometry.  It by no means has been fed---nor has it
rediscovered---a
large fraction of what is known in any of those fields. It might be
more  accurate to be humble and  restate those domains as: elementary
finite set  theory, trivial  observations  about four  kinds of  data
types,  arithmetic  and elementary  divisibility  theory, and  simple
relationships  between   lines,   angles,   and   triangles.   So   a
sophisticated concept in each domain---which was discovered by {\AM}---might
be: 

\han6{$\bullet $ \ de  Morgan's laws}

\han6{$\bullet $ \ the fact that Delete$\circ$Insert\footnote{Take
an item x, insert it into (the front of) structure B, then delete one 
(the first) occurrence of x from B. } never alters  Bags or
Lists}

\han6{$\bullet $ \ unique factorization}

\han6{$\bullet $ \ similar triangles}

\yskip

\4Can {\AM} work in a new field,  like politics?\0 {\AM} can work in a  new
elementary, formalized  domain, if it  is fed a supplemental  base of
conceptual primitives for that domain.  To work in plane geometry, it
sufficed to give {\AM} about twenty new primitive concepts,  each with a
few parts filled  in. Other domains which {\AM} could  work in include
elementary mechanics, game-playing, and programming. 
The more informal (non-internally-formalizable) the desired field, the less of {\AM} that
is relevant. Perhaps an {\AM}-like system could be built for a constrained, precise
political task.\footnote{For example, such a politics-oriented {\AM}-like system
might conceive the notion of a group of political entities which
view themselves as quite disparate, but which are viewed from the outside
as a single unit: \eg, `the Arabs',  `the American Indians'.
Conjectures about this concept might include its reputation as a poor
combatant (and why). Many of the same facets {\AM} uses would carry over to represent
concepts in that new domain. }
Disclaimer: Even for a very small domain, the amount of
common-sense knowledge such a system would need is staggering.
It is unfortunate to provide such a trivial answer to such an important question,
but there is no easy way to answer it more fully until years of additional research
are performed.

\4Can {\AM} discover X? Why didn't  it do Y?\0 It is difficult to  predict
whether {\AM}  will (without modifications)  ever make a  specific given
discovery.  Although  its capabilities are small, its limitations are
hazy. What makes  the matter even  worse is that,  given a concept  C
which {\AM} missed discovering, there is probably a reasonable heuristic
rule which is missing from {\AM}, which would enable that discovery. One
danger of this  ``debugging'' is that a  rule will be added  which only
leads  to that  one desired  discovery, and  isn't good  for anything
else. 
In that case, the new heuristic rule would simply be an \4encoding\0
of a specific bit of mathematics which {\AM} would then appear to
discover using general methods.
This  must be  avoided at  all  costs, \4even  at the  cost  of
intentionally giving up a certain discovery.\0 If  the needed rule is
general---it has  many applications  and leads to  many interesting
results---then it really was  an oversight not to include it in  {\AM}.
Although I believe  that there are not too many  such omissions still
within  the  small  realm {\AM}  explores,  there is  no  objective  way to
demonstrate that, except by further long tests with {\AM}.

\4In what  ways are new  concepts created?\0  Although the answer  to
this  is   accurately  given  in   Section  4.3
(namely, this  is  mainly the  jurisdiction  of the  right  sides  of
heuristic rules),  and although I  dislike the simple-minded  way it
makes  {\AM} sound, the list  below does characterize the  major ways in
which new concepts get born:

\yskip

\noindent $\bullet \ $ Fill in examples of a  concept (\eg, by instantiating or running its
definition)

\noindent $\bullet \ $ Create a  generalization of a  given concept (\eg, by  weakening its
definition)

\noindent $\bullet \ $ Create a  specialization of a given concept (\eg, by restricting its
domain/range)

\noindent $\bullet \ $ Compose  two   operations  f,g,   thereby  creating  a   new  one   h.
[Define h(x)$=_{df}$f(g(x))]

\noindent $\bullet \ $ Coalesce an operation f into a new one g. 
[Define g(x)$=_{df}$f(x,x)]

\noindent $\bullet \ $ Permute the order of the arguments of an operation. 
[Define g(x,y)$=_{df}$f(y,x)]

\noindent $\bullet \ $ Invert an operation [g(x)=y  iff f(y)=x] (\eg, from Squaring, create
Square-rooting)

\noindent $\bullet \ $ Canonize one  predicate P1  with respect  to a  more general  one  P2
[create  a new  concept  f,  an  operation, such  that:  P2(x,y)  iff
P1(f(x),f(y))]

\noindent $\bullet \ $ Create  a new operation  g, which is  the repeated  application of an
existing operation f.

\noindent $\bullet \ $ The usual logical  combinations of existing  concepts x,y:  
$x\meet y,\ x\join y,\ \hook x,\ldots$, etc.
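Several of these constructions are simply higher-order functions. As a rough sketch (in Python rather than the Interlisp {\AM} was written in; the function names are illustrative, not {\AM}'s own):

```python
def compose(f, g):
    """h(x) = f(g(x)): chain two existing operations into a new one."""
    return lambda x: f(g(x))

def coalesce(f):
    """g(x) = f(x, x): identify the two arguments of a binary operation."""
    return lambda x: f(x, x)

def permute(f):
    """g(x, y) = f(y, x): swap the argument order of a binary operation."""
    return lambda x, y: f(y, x)

def invert(f, domain):
    """g(y) = some x in a finite domain with f(x) = y, if any exists."""
    return lambda y: next((x for x in domain if f(x) == y), None)

# Coalescing multiplication yields squaring; inverting squaring over a
# small finite domain yields square-rooting, as in AM's own derivation.
times = lambda a, b: a * b
square = coalesce(times)
sqrt = invert(square, range(100))
```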

\yyskip

Below  is  a similar  list,  giving  the  primary  ways in  which  {\AM}
formulates new conjectures:

\yskip

\noindent $\bullet \ $ Notice that concept C1 is really an example of concept C2

\noindent $\bullet \ $ Notice   that   concept   C1   is   really   a   specialization  (or:
generalization) of C2

\noindent $\bullet \ $ Notice that C1 is equal to C2; or: \4almost always\0 equal

\noindent $\bullet \ $ Notice that C1  and C2 are  related by some  known concept

\noindent $\bullet \ $ Check and update the domain/range of an existing operation

\noindent $\bullet \ $ If two concepts are analogous, extend the analogy to their conjectures as well
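The "notice that C1 equals C2" step is empirical rather than deductive: the two concepts are compared on known examples. A minimal sketch of such a comparison, assuming illustrative names and an arbitrary 90% cutoff for "almost always":

```python
def conjecture_equal(def1, def2, examples, threshold=0.9):
    """Compare two concept definitions on known examples; return
    'equal', 'almost always equal', or None."""
    agree = sum(1 for x in examples if def1(x) == def2(x))
    ratio = agree / len(examples)
    if ratio == 1.0:
        return "equal"
    if ratio >= threshold:        # illustrative cutoff
        return "almost always equal"
    return None

# E.g., "a list equals its own doubly-reversed listing," checked
# empirically on a handful of small examples:
examples = [[], [1], [1, 2], [3, 1, 2]]
verdict = conjecture_equal(lambda s: s,
                           lambda s: list(reversed(list(reversed(s)))),
                           examples)
```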


 \SSSEC{Current Limitations}

Below are several shortcomings of {\AM}, which hurt its behavior but are
not believed to be
inherent  limitations of its design. They  are presented in order
of decreasing severity.

Perhaps the most  serious limitation on  {\AM}'s current behavior  arose
from the  lack of constraints  on left  sides of heuristic  rules. It
turned  out that this excessive  freedom made it  difficult for {\AM} to
inspect, analyze, and synthesize its own heuristics; such a need
was not foreseen at the time {\AM} was designed.  It was thought that the
power to manipulate heuristic rules  was an ability which the  author
must have, but which the system wouldn't require.   As it turned out,
{\AM} did  successfully develop new concepts  several levels deeper than
the ones it  started with. But  as the new  concepts got further  and
further  away  from those  initial  ones,  they  had fewer  and  fewer
specific  heuristics filled in (since they had  to be filled in by {\AM}
itself).  Gradually, {\AM} found itself relying on heuristics which were
very  general compared  to the  concepts it  was dealing  with (\eg,
forced to use  heuristics about Objects  when dealing with  Numbers).
Heuristics for  dealing with  heuristics do  exist, and their  number
could  be  increased.  This is not an easy job: heuristics are rarely more than
compiled hindsight, so it is difficult to create new ones ``before the fact.''
Our current research is based on the hypothesis that heuristics
can be discovered and reasoned about in much the same way that
mathematical operations can.  What this means is recoding each
of {\AM}'s heuristics as a full-fledged concept.  This would
enable the heuristics to work on each other, as well as on
concepts corresponding to mathematical entities.


{\AM}  has  no  notion  of proof,  proof  techniques,  formal  validity,
heuristics  for finding counterexamples,  etc.  Thus  it never really
establishes any conjecture formally.  This could be partially remedied
by  adding  about  25 new  concepts  (and  their  100 new  associated
heuristics) dealing with such topics.  The needed concepts have  been
outlined on paper, but not yet coded.
It would
probably require a few hundred hours to code and debug them.
[Lakatos 76] demonstrates the value of proof as an instrument of
discovery:  As one spirals in from conjecture to theorem, along the
path of repeated criticism and improvement, one often finds
motivation for modifying concepts' definitions, or for defining
new, promising ones.

The  user  interface is  quite  primitive, and  this  again  could be
dramatically improved with just a  couple hundred hours' work.   {\AM}'s
explanation  system  is  almost  nonexistent: the  user  must  ask  a
question  quickly, or {\AM} will have  already destroyed the information
needed to construct an answer. A clean record of recent system history,
and a scheme for tracking down reasons
for modifying old rules and adding new ones dynamically,
do not exist at
the level found, \eg, in MYCIN [See part I of this book].  
There is no trivial way to
have  the system  print  out  its heuristics  in  a format  which  is
intelligible to the untrained user.

An  important  type of  analogy which  was  untapped by  {\AM}  was that
between heuristics.  If two situations were similar,  conceivably the
heuristics useful  in one situation  might be useful (or  have useful
analogues)  in the new  situation
(see [Koppelman 75]).  Perhaps  this is a  viable way of
enlarging the  known heuristics.   Such ``meta-level'' activities  were
kept  to a minimum  throughout {\AM},  and this proved  to be  a serious
limitation. 

The idea of ``Intuitions'' facets was a flop.
Intuitions were meant to model reality, at least little pieces of it,
so that {\AM} could perform  (simulate) physical experiments, and observe
the results. The major problem here was that \4so\0 little of the world was
modelled that the only relationships derivable were those foreseen by the
author. This lack of generality was unacceptable, and the intuitions were
completely excised. The original idea might lead somewhere if it were
developed fully. 

 \SSSEC{Limitations of the Agenda scheme}


Currently, it is difficult to include heuristics which  interact with
one another  in any significant  way. The  whole fibre of  the Agenda
scheme assumes perfect independence of heuristics. The global formula
used to  rate tasks on  the agenda  assumes perfect superposition  of
reasons:  there are  no  ``cross-terms.''   Is  this assumption  always
valid? Unfortunately \4no\1, not even  for the limited domain {\AM}  has
explored.  Sometimes, two reasons are very similar: ``Examples of Sets
would permit  finding examples of Union'' and  ``Examples of Sets would
permit finding  examples of Intersection.''  In that  case, their  two
ratings shouldn't cause  such a big increase in  the overall priority
value of the task ``\6Fillin examples of Sets\1.''
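The superposition assumption can be made concrete: each task's priority is computed from its reasons' ratings alone, with no term for overlap between reasons. The combining formula below is only illustrative (a root-sum-of-squares), not necessarily {\AM}'s exact formula, but it exhibits the problem: two near-duplicate reasons count as fully as two independent ones.

```python
import math

def task_priority(reason_ratings):
    """Illustrative combining formula: reasons superimpose with no
    cross-terms, so near-duplicate reasons inflate the total."""
    return math.sqrt(sum(r * r for r in reason_ratings))

# "Examples of Sets would permit finding examples of Union" and
# "...examples of Intersection" are nearly the same reason, yet they
# raise the priority as much as two genuinely independent reasons:
one_reason = task_priority([400])
with_near_duplicate = task_priority([400, 400])
```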

\yskip

\noindent There are a few other minor limitations of the current Agenda scheme:

\yskip

(i) Sometimes, a heuristic rule will want to \4dissuade\0 the system from
some activity; so occasionally we'd like a  \4negative\0 numeric contribution to  a task's
priority  value.

(ii) {\AM} discards the formula which produced a reason's rating, and
keeps around only the numeric result of that formula; but many of
those formulae involve time-dependent quantities (\eg, the number
of entries on a particular facet of a particular concept).

(iii) Better than a single priority value for each  task would be
for {\AM} to maintain a vector of  numbers, or, better still, symbolic
descriptions.

(iv) The  agenda  list should  really  be an  agenda  \4tree\0
(maybe an agenda {\it Heap}),
since  the
ordering of  tasks is  really just  partial, not  total. 
The utility of this would be especially high if multiprocessor
hardware were available to run the program.
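Limitations (i) and (ii) could be met together by storing each reason's rating {\it formula} rather than its frozen numeric value, and allowing formulas to contribute negatively. A minimal sketch under those assumptions (all names here are hypothetical):

```python
def current_priority(task, state):
    """Re-evaluate each reason's formula against the current system
    state, so time-dependent quantities (e.g., how many examples a
    facet now has) stay accurate; negative ratings dissuade."""
    return sum(formula(state) for formula in task["reason_formulas"])

task = {"reason_formulas": [
    lambda s: 100 - 10 * s["examples_of_sets"],    # less urgent as examples accrue
    lambda s: -50 if s["recently_failed"] else 0,  # inhibitory reason
]}
p = current_priority(task, {"examples_of_sets": 3, "recently_failed": True})
# 100 - 30 - 50 = 20
```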

 \SSSEC{Limiting Assumptions}

{\AM} only ``got off the ground'' because a number of sweeping assumptions
were made, pertaining to what could be ignored, how a complex process
could be  adequately simulated,  etc.   Now that  {\AM} \4is\0  running,
however, those  same simplifications  crop up  as limitations  to the
system's  behavior.   Each of the  following points  is a `convenient
falsehood'.  Although the reader has already been told  about some of
these, it's worth listing them all together here:

\yskip

$\bullet \ $ The  only communication necessary from  {\AM} to the  user is keeping
the user informed of what {\AM} is doing. No natural language ability is
required by {\AM}; simple template instantiation is sufficient.

$\bullet \ $  The only  communication from  the  user to  {\AM}  is an  occasional
interrupt, when the user wishes to provide some guidance or to pose a
query.  Both of these can  be stereotyped and passed easily through  a
very narrow channel.\footnote{For example, a set of escape characters, so $\up W$ means
``{\it Why  did you  do that?},''  $\up U$ means ``{\it Uninteresting! Go  on to
something else},'' etc. }

$\bullet \ $ Each heuristic has  a well-defined domain of  applicability, which
can be specified just by giving the name of a single concept.

$\bullet \ $ If  concept C1 is more  specialized than C2,  then C1's heuristics
will be more powerful  and should be  executed before C2's  (whenever
both concepts' heuristics are relevant).

$\bullet \ $ If h1 and h2 are two heuristics attached  to concept C, then it is
not necessary to spend any time ordering them.

$\bullet \ $  Heuristics superimpose  perfectly;  they never  interact strongly
with each other.

$\bullet \ $ The reasons supporting a task can be mere tokens; it suffices to
be able to inspect them for equality. They need not follow a constrained
syntax. The value of a reason is adequately 
characterized by a unidimensional
numeric rating. 

$\bullet \ $ The reasons supporting a task superimpose perfectly; they never interact
with each other. 

$\bullet \ $ Supporting reasons---and their ratings---never change
with time, with one exception: the ephemeron `Focus of attention'.

$\bullet \ $ It doesn't matter in what order the supporting reasons for a task 
were added.

$\bullet \ $ There is no need for negative or inhibitory reasons, which would decrease the
priority value of a task.

$\bullet \ $ At any  moment, the top few  tasks on the  agenda are not  coupled
strongly;  it is  not necessary  to expend  extra processing  time to
carefully order them.

$\bullet \ $ The  tasks on the agenda are completely independent of each other,
in the sense of one task `enabling' or `waking-up' another. 

$\bullet \ $  Mathematics research  has  a  clean,  simple model  (see  Section
7.3.5)
which indicates  that it  is a  search process  governed  by a  large
collection of heuristic rules.

$\bullet \ $ Elementary mathematics  is such that valuable new concepts will be
discovered fairly regularly.   

$\bullet \ $ The worth of  each new concept can  be
estimated easily,  after just a brief  investigation.  

$\bullet \ $ Contradictions
will arise extremely infrequently,
and it is not disastrous to ignore them when they do occur.
The same indifference applies to the danger of believing in false conjectures.

$\bullet \ $ When doing theory formation in elementary mathematics, 
proof and formal reasoning are dispensable.

$\bullet \ $ Even as more knowledge  is obtained, the set of facets  need never
change.

$\bullet \ $ For any piece of knowledge sought  or obtained, there is precisely
one  facet of one existing\footnote{The only allowable exception  is that a
new piece of information  might require the  creation of a brand  new
concept,  and then  require storage  somewhere  on that  concept.   }
concept  where that knowledge ought  to be stored, and  it is easy to
determine that proper location.

$\bullet \ $ Even as more concepts are defined, the body of heuristics need not
grow much.

$\bullet \ $ Any common-sense knowledge required by {\AM} is automatically present
within  the   heuristic   rules.  So,   \eg,   no  special   spatial
visualization abilities are needed.

\yskip

\noindent It is worth repeating here that the above assumptions are all clearly
{\bf false}. Yet  none of them was too damaging to {\AM}'s behavior, and their combined
presence made the creation of {\AM} feasible.

 \SSSEC{Choice of Domain}

\qq{The genesis of mathematical creation is a problem which should intensely
interest the psychologist. It is the activity in which the
human mind seems to take least from the outside world,
in which it acts or seems to act only of itself and on itself, so that
in studying the procedure of mathematical thought we may hope to
reach what is most essential in man's mind.}{Poincar\'e}


Here are some questions this subsection will address:

\yskip

\noindent $\bullet \ $ What are  the inherent  limitations---and
advantages---in fixing a domain for {\AM} to work in?

\noindent $\bullet \ $ What characteristics are favorable  to automating research in
any given domain?

\noindent $\bullet \ $ What are  the  specific reasons  for and  against  elementary
finite set theory as the chosen starting domain?

\yskip

Research in  various domains  of science  and math  proceeds slightly
differently.   For  example, psychology  is interested  in explaining
people, not in creating new  kinds of people. Math is  not interested
in individual entities so much as in new kinds of entities. There are
ethical restrictions on physicians which prevent certain  experiments
from being  done. Political  experiments rarely permit  backtracking,
etc.  Each field has its own peculiarities.

If we  want a system to work in many  domains, we have to sacrifice
some power.\footnote{This  is assuming a  system of a  given fixed size.  If
this restriction  isn't present, then a  reasonable ``general-purpose''
system  could  be  built  as  several  systems  linked  by one  giant
switch.} Within a given field of knowledge (like math),  the finer the
category we limit ourselves  to, the more specific are the heuristics
which become available.   So  it was  reasonable to  make this  first
attempt limited to one narrow domain.

This brings  up  the choice  of domain.  What  should it  be? As  the
DENDRAL  project illustrated so clearly,\footnote{See [Feigenbaum et al. 71]. 
In that case, the choice of subject was  enabled by [Lederberg 64].  }
choice  of  subject  domain  is quite  important  when  studying  how
researchers  discover and  develop their  theories.   Mathematics was
chosen as the domain of this investigation, because

\BN

\yskip

\hh  In doing  math research, one  needn't cope with the  uncertainties
and  fallibility  of  testing   equipment;  that  is,  there  are  no
uncertainties in the data (compared to, \eg, molecular structure inference
from mass spectrograms).

\hh  Reliance on  experts' introspections is  one of the  most powerful
techniques  for  codifying the  judgmental criteria  necessary  to do
effective work in a field;  I personally have had enough training  in
elementary mathematics  so that I  didn't have to rely  completely on
external  sources for  guidance in formulating  such heuristic rules.
Also,  several  excellent  sources  were  available   [P\'olya,  Skemp,
Hadamard, Kershner, etc.].

\hh  The more formal a science is, the  easier it is to automate. For a machine to
carry out research in  psychology would require  more knowledge  about
human information  processing than now  is known,  because psychology
deals with entities as complex as you and I. Also, in a formal science,
the \4languages\0  to  communicate  information can  be  simple  even
though the \4messages\0 themselves may be sophisticated.

\hh   Since mathematics  can deal  with any  conceivable constructs,  a
researcher there is not limited to explaining observed data.  Related
to this is the freedom to investigate---or to give up on---whatever
the researcher  wants to. There is  no single discovery  which is the
``goal,'' no given problem to solve, no right or wrong behavior.

\hh  Unlike ``simpler'' fields, such as propositional logic, there is  an
abundance of heuristic rules available for the picking.

\yskip

The limitations of math as a domain  are closely intertwined with its
advantages.   Having no  ties to real-world  data can be  viewed as a
limitation, as can having no  clear goal. There is always the  danger
that  {\AM} will  give up  on each  theory as  soon as  the first  tough
obstacle crops up.

Since  math (and in particular geometry and number theory)
has been worked  on for millennia by  some of the greatest
minds from  many  different cultures,  it is  unlikely  that a  small
effort  like  {\AM}  would make  any  new  inroads  or have any  startling
insights.  In that respect, DENDRAL's space was much less explored.
Of course
math---even at the elementary level that {\AM} explored it---still has
undiscovered gems (\eg, the recent unearthing of Conway's numbers [Knuth 74]).

One point of agreement between Weizenbaum and Lederberg\footnote{See
the quote at the front of the next subsection. It is from [Lederberg 76],
a review   of  [Weizenbaum 76].}
is that AI can succeed in automating an activity only
when a  ``strong theory'' of  that activity  exists. {\AM}  is built on  a
detailed  model  of  how  humans  do  math research.    In  the  next
subsection, we'll discuss the model of math research that {\AM} assumes.

 \SSSEC{Limitations of the Model of Math Research}


\qq{Weizenbaum does point to projects in mathematics and
chemistry where computers have shown their potential for
assisting human scientists in solving problems. He correctly
points out that these successes are based on the existence of
``strong theories'' about their subject matter. }{Lederberg}

{\AM}, like  anything else in  this world, is  constrained by a  mass of
assumptions. Most of these are ``compiled'' or interwoven into the very
fabric of {\AM},  hence can't  be tested by  experiments on {\AM}.   
Some of these were just discussed a few pages ago, in Section
7.2.4.

Another body of assumptions exists.
{\AM}  is
built around  a particular  model of  how mathematicians  actually go
about   doing  their   research.     This  model  was   derived  from
introspection, but can be  supported by quotes from  P\'olya, Kershner,
Hadamard, Saaty, Skemp, and many others.  No attempt will be made to justify
any of these premises.  Here is a simplified summary  of
that information processing model for math theory formation:

\yyskip

\ctrline{{\2MODEL OF MATH RESEARCH}}

\yskip

\BN

\hh  The order in which a math textbook presents a theory is almost the
exact opposite  of the order in which  it was actually discovered and
developed.  In a text, new  definitions are stated with little or  no
motivation, and they turn out to be just the ones needed to state the
next big  theorem, whose proof then magically appears. In contrast, a
mathematician  doing   research  will   examine  some   already-known
concepts, perhaps trying to find some regularity in experimental data
involving them. The patterns he  notices are the conjectures he  must
investigate further, and these relationships directly motivate him to
make new definitions.


\hh   Each step  the  researcher takes  while developing  a  new theory
involves choosing from  a large set of  ``legal'' alternatives---that
is,  searching.   The  key to  keeping this  from  becoming a  blind,
explosive  search  is the  proper  use of  evaluation  criteria. Each
mathematician uses his own  personal heuristics to choose  the ``best''
alternative available at each moment---the one most plausible to him.

\hh    Non-formal   criteria   (aesthetic  interestingness,   inductive
inference from  empirical evidence,  analogy, and  utility) are  much
more  important   than   formal  deductive   methods  in   developing
mathematically   worthwhile   theories,   and   in  avoiding   barren
diversions.

\hh  Progress in \4any\0  field of mathematics demands much  non-formal
heuristic  expertise  in  \4many\0  different  ``nearby''  mathematical
fields.  So a  broad, universal  core of  knowledge must  be mastered
before any single theory can meaningfully be developed.

\hh  It is sufficient (and  pragmatically necessary) to have and  use a
large  set  of  informal  heuristic  rules. These  rules  direct  the
researcher's next activities, depending  on the current situation  he
is  in.   These rules  can  be assumed  to  superimpose ideally:  the
combined  effect of several rules  is just the sum  of the individual
effects.

\hh  The  necessary  heuristic rules  are  virtually  the same  in  all
branches of mathematics,  and at all levels of  sophistication.  Each
specialized  field will  have some of  its own  heuristics; those are
normally much more powerful than the general-purpose heuristics.

\hh  For true understanding, the researcher should  grasp\footnote{Have access
to,  relate to,  store,  be able  to  manipulate, be  able to  answer
questions about  }  each  concept  in  several  ways:  declaratively,
abstractly, operationally,  knowing  when it  is relevant,  and as  a
bunch of examples.

\hh  Common metaphysical  assumptions about nature and  science: Nature
is   fair,  uniform,  and   regular.     Coincidences  have  meaning.
Statistical considerations  are valid  when looking  at  mathematical
data.   Simplicity and  symmetry and  synergy are  the rule,  not the
exception.

\yskip

 \SSSEC{Ultimate powers and weaknesses}

Consider now  \4any\0 system which  is consistent with  the preceding
model  of math  research, and  whose orientation  is to  discover and
develop new (to the system) mathematical theories.  This  includes {\AM}
itself, but  might also include  a bright high-school senior  who has
been taught a large body of heuristic rules.

What  can such  systems ultimately  achieve? What are  their ultimate
limits?    Answers  to  ultimate  questions  are  hard   to  come  by
experimentally,  so  this  discussion  will be  quite  philosophical,
speculative, and short.  The model  of math research hinges on  the
use of heuristic rules for guidance at all  levels of behavior. It is
questionable  whether  or  not  all  known  mathematics could  evolve
smoothly in this way.   As a first  order fixup, we've mentioned  the
need to  provide good meta-heuristics,  to keep enlarging the  set of
heuristics.   If this is not  enough (if meta-meta-$\ldots $-heuristics are
needed),  then  the model  is  a  poor  one  and  has  some  inherent
limitations.\footnote{If  Ptolemy had had access to  a digital computer, all
his data could have been made to fit (to any desired accuracy),  just
by computing epi-cycles,  epi-epi-cycles,$\ldots $ to the needed  number of
epi's. We in AI must constantly be on guard against that error.  } If
some discoveries  can only be  made non-rationally (by random  chance, by
Gestalt, etc.)
then any such system would be incapable of finding those concepts.

Turning aside  from math,  what about  systems whose  design---as a
computer  program---is  similar to  {\AM}?\footnote{Having an  agenda of tasks
with reasons and reason-ratings  combining to form a  global priority
for  each  task,  having  
units/\-modules/\-frames/\-Beings/\-Actors/\-scripts/\-concepts
which have parts/\-slots/\-aspects/\-facets, etc.  Heuristic rules are tacked  onto
relevant concepts,  and  are executed  to produce  new concepts,  new
tasks, new facet  entries. } Building such systems will be ``fun,'' and
perhaps will result in new discoveries in other fields.   Eventually,
scientists (at  least in a few  very hard domains) may  relegate more
and  more of their  ``hack'' research  duties to {\AM}-like  systems.  The
ultimate limitations  will  be those  arising  from incorrect  (\eg,
partial)  models of  the  activities the  system must  perform.   The
systems themselves may  help improve these  models: experiments  that
are performed  on the systems  are actually  tests of the  underlying
model;  the results might  cause revisions to  be made  in the model,
then in the system, and the whole cycle would begin again.

\SSEC{Final Conclusions}

Before quitting, let's summarize what's  worth remembering about this
research project.

\yskip


\noindent $\bullet $  It is  a demonstration that  a few  hundred general heuristic
rules suffice to  guide an automated math  researcher as it  explores
and expands a  large but incomplete knowledge base  of math concepts.
{\AM}  serves as a living existence proof  that creative research can be
effectively modelled as heuristic search.

\yskip


\noindent $\bullet $  The book also introduces a control structure based upon an
agenda  of  small research  tasks,  each with  a  list of  supporting
reasons attached.

\yskip


\noindent $\bullet $  The  main limitation  of {\AM}  was its  inability to  synthesize
powerful new heuristics for the new concepts it defined.


\yskip

\noindent $\bullet $  The main successes  were the few novel ideas it  came up with,
the ease  with which a new task domain was  fed to the system, and---most
importantly---the overall  rational sequences  of behavior  {\AM}
exhibited.

\yskip


\noindent $\bullet $  The greatest long-range importance  of {\AM} may well lie in  the
body of  heuristics assembled, either  as the
seed for  a huge base of experts' heuristics, or as a new orientation
for mathematics education.

\yskip